NCHLT SiSwati POS tag set

Tag set

For the purposes of annotation, this tag set is largely taken over from Taljard et al. (2008) and from various documents compiled by G. Faasz and U. Heid (IMS, Stuttgart) and by D.J. Prinsloo and E. Taljard (University of Pretoria). The information below describes the current state of the tag set; further development will probably require changes.

The tagset is mainly based on the lexical and morphological criteria defined by Lombard (1985) and Louwrens (1991). The logical structure of the tagset is divided into two layers of linguistic description (annotation levels):

The first annotation level (level 1) contains all mandatory information (in EAGLES terms, obligatory information). It consists of up to three elements: one indicating the word class, a second specifying functional or syntactic properties, and a third giving morphological specifics, e.g. PRO(noun)EMP(hatic)PERS(onal).

The second annotation level (level 2) contains recommended and optional information. In most cases this level is used for a detailed description of the closed-class items described in the tagger lexicon, as the following excerpt shows:

 

Figure 1: Annotation levels

Description                    Tag, level 1 (mandatory)     Tag, level 2 (optional/recommended)
Pronouns: emphatic personal    PROEMPPERS                   1sg, 2sg, 1pl, 2pl
Verbals                        V                            tr
Morphemes: deficient           MORPH                        def
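The two levels can be combined in a single tag string. The following is a minimal Python sketch, assuming (as in the level 2 lists further below, e.g. V_tr and NPP_place) that the level 2 value is appended to the level 1 tag after an underscore. The function names are hypothetical and not part of any NCHLT tool. Since only level 1 has been annotated, `level1` shows how a combined tag reduces to its mandatory part.

```python
from typing import Optional, Tuple


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a combined tag into (level 1, level 2); level 2 is None if absent.

    Assumes the underscore convention seen in tags such as V_tr.
    """
    level1_part, sep, level2_part = tag.partition("_")
    return level1_part, (level2_part if sep else None)


def level1(tag: str) -> str:
    """Keep only the mandatory level 1 part of a tag."""
    return split_tag(tag)[0]


# Rows of Figure 1 written as combined tags, plus a tag without level 2.
for combined in ["PROEMPPERS_1sg", "V_tr", "N01"]:
    print(combined, "->", split_tag(combined))
```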

 

In disjunctively written languages, linguistic words are tagged in addition to orthographic words, resulting in two layers of POS annotation: one for orthographic words and one for linguistic words. Conjunctively written languages such as SiSwati do not need this extra layer.

The tagset currently distinguishes 20 categories applicable to SiSwati and two levels of annotation, although only level 1 has been annotated so far. The first part of each tag indicates the general category of the unit in question. The categories are as follows:

 

Tag         Explanation
PUNC        Punctuation
ABBR        Abbreviation (incl. acronyms)
ADJ         Adjective (incl. enumerative)
ADV         Adverb
CDEM        Class-indicating demonstrative
CONJ        Conjunction
COP         Copulative (copulative subject concord, demonstrative copulative, copulative verb)
FOR         Foreign
IDEO        Ideophone
INT         Interjection
INTER       Question word
N           Noun
NPP         Place and brand name
NUM         Numerative
POSS        Possessive (possessive concord, possessive pronoun)
PROEMP      Emphatic pronoun
PROQUANT    Quantitative pronoun
REL         Relative
V           Verbal
VAUX        Auxiliary verb
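Several tags are prefixes of others (N of NPP and NUM, INT of INTER, V of VAUX), so a program that maps a full tag such as CDEM02 or PROEMPPERS_2sg back to one of the 20 categories should prefer the longest matching prefix. A minimal sketch under that assumption: the dictionary simply restates the table above, and `category` is a hypothetical helper.

```python
from typing import Optional

# The 20 level 1 categories applicable to SiSwati, restated from the table.
CATEGORIES = {
    "PUNC": "Punctuation",
    "ABBR": "Abbreviation (incl. acronyms)",
    "ADJ": "Adjective (incl. enumerative)",
    "ADV": "Adverb",
    "CDEM": "Class-indicating demonstrative",
    "CONJ": "Conjunction",
    "COP": "Copulative",
    "FOR": "Foreign",
    "IDEO": "Ideophone",
    "INT": "Interjection",
    "INTER": "Question word",
    "N": "Noun",
    "NPP": "Place and brand name",
    "NUM": "Numerative",
    "POSS": "Possessive",
    "PROEMP": "Emphatic pronoun",
    "PROQUANT": "Quantitative pronoun",
    "REL": "Relative",
    "V": "Verbal",
    "VAUX": "Auxiliary verb",
}


def category(tag: str) -> Optional[str]:
    """Map a full tag to its category via the longest matching prefix."""
    level1 = tag.partition("_")[0]  # drop any level 2 suffix
    matches = [prefix for prefix in CATEGORIES if level1.startswith(prefix)]
    return max(matches, key=len) if matches else None
```

Note that an unknown tag that happens to start with a listed prefix (say, a mistyped NX) would still be mapped to N; a stricter check would compare against the full tag inventory.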

 

 

 

 

Tags not applicable to SiSwati

Tag         Explanation
ASP         Aspectual marker
AUX         Auxiliary stem
CN          Class-indicating nominal prefix
CO          Class-indicating object concord
CS          Class-indicating subject concord
MNEG        Negative morpheme
PART        Particle
TENS        Tense marker

 


PUNCTUATION

Level 1: PUNC

Notes:

Examples:

;               PUNC
(               PUNC
!               PUNC

PUNC

 

ABBREVIATION

Level 1: ABBR

Notes:

Examples:

NGO             ABBR
njll            ABBR

 

ADJECTIVE

Level 1: ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC

Notes:

Examples:

munye           ADJ01
labasha         ADJ02
kuletinye       ADJLOC
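The range notation in the Level 1 lines (ADJ01-11 for ADJ01 to ADJ11, and so on) can be expanded mechanically into the full list of tags, which is convenient when checking annotations against the inventory. A minimal sketch, assuming that a hyphen between two two-digit numbers always denotes an inclusive range and that every other comma-separated item is a literal tag; `expand_inventory` is a hypothetical name.

```python
import re
from typing import List

# A category prefix followed by an inclusive two-digit range, e.g. ADJ01-11.
RANGE = re.compile(r"([A-Z]+)(\d{2})-(\d{2})")


def expand_inventory(spec: str) -> List[str]:
    """Expand a Level 1 line such as 'ADJ01-11, ADJ14-15, ADJLOC' into tags."""
    tags: List[str] = []
    for item in (part.strip() for part in spec.split(",")):
        match = RANGE.fullmatch(item)
        if match:
            prefix = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            tags.extend(f"{prefix}{n:02d}" for n in range(start, end + 1))
        else:
            tags.append(item)  # literal tag such as ADJ01a or ADJLOC
    return tags


print(expand_inventory("ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC"))
```

The same function applies unchanged to the CDEM, NOUN, POSSESSIVE, EMPHATIC PRONOUN and QUANTITATIVE PRONOUN lines below; for the adjective line it yields 16 tags.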

 

ADVERB

Level 1: ADV, ADVLOC

Notes:

Examples:

ngeke           ADV
ngaphandle      ADV
ekhatsi         ADVLOC

 

CLASS-INDICATING DEMONSTRATIVE

Level 1: CDEM01-11, CDEM14-15, CDEMLOC

Notes:

Examples:

laba            CDEM02
leyo            CDEM04
kulelo          CDEMLOC

 

CONJUNCTION

Level 1: CONJ

Notes:

Examples:

futsi           CONJ
kepha           CONJ

 

COPULATIVE

Level 1: COP

Level 2: COP_neg, COP_nil

Notes:

(-be, - and -bilê). For the copulative verb stem -se, the level 2 tag COP_neg is used, as it is for the verb stem -be (< -ba) when it is used in the negative form.

Examples:

yincenye        COP
likwati         COP
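The level 2 rule for copulative verb stems stated in the notes can be written as a small decision function. This is a sketch only: it assumes that the stem and its polarity have already been identified by some hypothetical upstream analysis, and it does not cover COP_nil, whose criteria are not described here. `copulative_tag` is a hypothetical name.

```python
def copulative_tag(stem: str, negative: bool) -> str:
    """Return COP, with COP_neg where the -se / -be rule requires it.

    `stem` is a copulative verb stem such as "-se" or "-be"; `negative`
    says whether the form is negative. Both are assumed inputs.
    """
    bare = stem.lstrip("-\u2013")  # accept a leading hyphen or en dash
    if bare == "se":
        return "COP_neg"  # -se always takes COP_neg
    if bare == "be" and negative:
        return "COP_neg"  # -be (< -ba) takes COP_neg only in the negative
    return "COP"  # this rule assigns no level 2 information
```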

 

FOREIGN

Level 1: FOR

Notes:

Examples:

development     FOR
planning        FOR

 

IDEOPHONE

Level 1: IDEO

 

 

Examples:

mbamba          IDEO
ngco            IDEO

 

INTERJECTION

Level 1: INT

Level 2: INT_neg, INT_nil

Notes:

Examples:

hhayi           INT
hawu            INT

 

INTERROGATIVES

Level 1: INTER

Level 2: _man, _time, _loc, _N01a, _N02a

Notes:

Examples:

nini            INTER
bani            INTER

 

NOUN

Level 1: N01-11, N14-15, N01a, N02a, NLOC, N00

Level 2: _aug, _dim, _loc, _name, _nil

Notes:

Examples:

cembu           N00
umuntfu         N01
bafati          N02
bomake          N02a
lizinga         N05
budlelwane      N14
edolobheni      NLOC
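The class-indexed tags (N05, CDEM02, POSS03 and so on) follow one pattern: a category prefix followed either by a two-digit class number, optionally with the suffix a (as in N01a and N02a), or by LOC for the locative tags. A minimal sketch that extracts the two parts with a regular expression, assuming this pattern holds for every class-indexed tag in the lists of this document; `class_of` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Category letters (shortest possible), then a two-digit class
# optionally followed by "a", or LOC.
CLASS_TAG = re.compile(r"([A-Z]+?)(\d{2}a?|LOC)")


def class_of(tag: str) -> Optional[Tuple[str, str]]:
    """Return (category prefix, class) for a class-indexed tag, else None."""
    level1 = tag.partition("_")[0]  # ignore level 2 suffixes such as _dim
    match = CLASS_TAG.fullmatch(level1)
    return (match.group(1), match.group(2)) if match else None


for tag in ["N05_dim", "POSS03", "CDEMLOC", "PROEMPPERS", "V"]:
    print(tag, class_of(tag))
```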

 

PLACE AND BRAND NAME

Level 1: NPP

Level 2: NPP_place, NPP_brand

Notes:

Examples:

KaZulu-Natali   NPP
Mars            NPP

 

NUMERATIVE

Level 1: NUM

Notes:

Examples:

2.2             NUM
74(a)           NUM
2005            NUM
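Punctuation and digit-based numeratives like the examples above are regular enough to be pre-tagged with simple patterns before manual annotation. The sketch below is a heuristic assumption, not part of the tag set: it only recognizes digit sequences with optional decimal parts and an optional bracketed letter (as in 2.2, 74(a) and 2005), plus tokens made up entirely of punctuation characters. `pretag` is a hypothetical name; it returns None for tokens that need a human decision.

```python
import re
from typing import Optional

# Digits, optional ".digits" groups, optional "(letter)": 2.2, 74(a), 2005
NUM_PATTERN = re.compile(r"\d+(?:\.\d+)*(?:\([a-z]\))?")
# One or more characters that are neither word characters nor whitespace
PUNC_PATTERN = re.compile(r"[^\w\s]+")


def pretag(token: str) -> Optional[str]:
    """Suggest NUM or PUNC for a token, or None if neither pattern fits."""
    if NUM_PATTERN.fullmatch(token):  # checked first: 74(a) contains brackets
        return "NUM"
    if PUNC_PATTERN.fullmatch(token):
        return "PUNC"
    return None


for token in ["2.2", "74(a)", "2005", ";", "(", "!", "NGO", "bafati"]:
    print(token, pretag(token))
```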

 

POSSESSIVE

Level 1: POSS01-11, POSS14-15, POSSLOC, POSSPERS, POSSKA

Level 2: POSSPERS_1pl, POSSPERS_2pl

Notes:

 

Examples:

wahulumende     POSS03
yato            POSS04
lwakhe          POSS11

 

EMPHATIC PRONOUN

Level 1: PROEMP01-11, PROEMP14-15, PROEMPLOC, PROEMPPERS

Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl

Notes:

Examples:

wena            PROEMP03
yona            PROEMP09
kuto            PROEMPLOC

 

QUANTITATIVE PRONOUN

Level 1: PROQUANT01-11, PROQUANT14-15, PROQUANTLOC

Notes:

Examples:

wonkhe          PROQUANT01
sonkhe          PROQUANT07
konkhe          PROQUANT15

 

RELATIVE

Level 1: REL

Notes:

Examples:

esimeni         REL
labacabene      REL

 

VERBAL

Level 1: V

Level 2: V_tr, V_itr, V_dtr

Notes:

Examples:

kubona          V
babelana        V
kuhlelwa        V

 

AUXILIARY VERB

Level 1: VAUX

Level 2: VAUX_tr, VAUX_itr, VAUX_dtr

Notes:

Examples:

cishe           VAUX
kube            VAUX
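The level 2 values listed in the sections above form a closed set for each level 1 tag, so a combined tag can be checked mechanically. A minimal sketch, assuming the underscore convention of tags like V_tr. The noun suffixes (_aug, _dim, _loc, _name, _nil) are left out because they attach to many class-indexed noun tags. `LEVEL2` and `level2_ok` are hypothetical names.

```python
# Allowed level 2 values per level 1 tag, taken from the Level 2 lines above.
LEVEL2 = {
    "COP": {"neg", "nil"},
    "INT": {"neg", "nil"},
    "INTER": {"man", "time", "loc", "N01a", "N02a"},
    "NPP": {"place", "brand"},
    "POSSPERS": {"1pl", "2pl"},
    "PROEMPPERS": {"1sg", "1pl", "2sg", "2pl"},
    "V": {"tr", "itr", "dtr"},
    "VAUX": {"tr", "itr", "dtr"},
}


def level2_ok(tag: str) -> bool:
    """True if `tag` is LEVEL1_value with a value listed for that level 1 tag."""
    base, sep, value = tag.partition("_")
    return bool(sep) and value in LEVEL2.get(base, set())
```

With this check, a tag written with a hyphen instead of an underscore, such as VAUX-itr, is rejected, as is a value borrowed from another category, such as V_neg.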